Bounding errors of Expectation-Propagation
Expectation Propagation is a very popular algorithm for variational
inference, but comes with few theoretical guarantees. In this article, we prove
that the approximation errors made by EP can be bounded. Our bounds have an
asymptotic interpretation in the number of datapoints, which allows us to
study EP's convergence with respect to the true posterior. In particular, we
show that EP converges at a rate of O(n^{-2}) for the mean, up to
an order of magnitude faster than the traditional Gaussian approximation at the
mode. We also give similar asymptotic expansions for moments of order 2 to 4,
as well as excess Kullback-Leibler cost (defined as the additional KL cost
incurred by using EP rather than the ideal Gaussian approximation). All these
expansions highlight the superior convergence properties of EP. Our approach
for deriving those results is likely applicable to many similar approximate
inference methods. In addition, we introduce bounds on the moments of
log-concave distributions that may be of independent interest.
Comment: Accepted and published at NIPS 2016
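The EP iteration analysed above can be sketched on a toy 1-D model: a Gaussian prior times Student-t likelihood factors, each approximated by a Gaussian site via moment matching. This is an illustrative reconstruction, not the paper's code; the model, data, and quadrature grid are all assumptions.

```python
import numpy as np

# Toy 1-D Expectation-Propagation sketch (illustrative, not the paper's code).
# Model: prior N(0,1) times Student-t likelihood factors; each factor gets a
# Gaussian "site" in natural parameters (r = precision*mean, p = precision).
# Tilted moments are computed by brute-force quadrature on a grid.
y = np.array([1.0, 2.0, -0.5])          # assumed observations
nu = 4.0                                 # assumed t degrees of freedom
grid = np.linspace(-10, 10, 4001)
dx = grid[1] - grid[0]

def t_factor(x, yi):
    return (1.0 + (x - yi) ** 2 / nu) ** (-(nu + 1) / 2)

r0, p0 = 0.0, 1.0                        # prior natural parameters
r_site = np.zeros(len(y))
p_site = np.zeros(len(y))

for sweep in range(20):
    for i in range(len(y)):
        # cavity: remove site i from the current global approximation
        p_cav = p0 + p_site.sum() - p_site[i]
        r_cav = r0 + r_site.sum() - r_site[i]
        # tilted distribution: cavity Gaussian times the true factor
        tilt = np.exp(-0.5 * p_cav * grid**2 + r_cav * grid) * t_factor(grid, y[i])
        tilt /= tilt.sum() * dx
        m = (grid * tilt).sum() * dx
        v = ((grid - m) ** 2 * tilt).sum() * dx
        # moment matching: new site = tilted moments minus the cavity
        p_site[i] = 1.0 / v - p_cav
        r_site[i] = m / v - r_cav

ep_mean = (r0 + r_site.sum()) / (p0 + p_site.sum())

# brute-force "exact" posterior mean on the same grid, for comparison
post = np.exp(-0.5 * grid**2)
for yi in y:
    post = post * t_factor(grid, yi)
post /= post.sum() * dx
true_mean = (grid * post).sum() * dx
```

On this well-behaved toy posterior the EP mean lands very close to the quadrature-based mean, which is the qualitative behaviour the bounds above quantify.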
The Poisson transform for unnormalised statistical models
Contrary to standard statistical models, unnormalised statistical models only
specify the likelihood function up to a constant. While such models are natural
and popular, the lack of normalisation makes inference much more difficult.
Here we show that inferring the parameters of an unnormalised model on some space
can be mapped onto an equivalent problem of estimating the intensity
of a Poisson point process on that same space. The unnormalised statistical model now
specifies an intensity function that does not need to be normalised.
Effectively, the normalisation constant may now be inferred as just another
parameter, at no loss of information. The result can be extended to cover
non-IID models, which includes for example unnormalised models for sequences of
graphs (dynamical graphs), or for sequences of binary vectors. As a
consequence, we prove that unnormalised parametric inference in non-IID models
can be turned into a semi-parametric estimation problem. Moreover, we show that
the noise-contrastive divergence of Gutmann & Hyv\"arinen (2012) can be
understood as an approximation of the Poisson transform, and extended to
non-IID settings. We use our results to fit spatial Markov chain models of eye
movements, where the Poisson transform allows us to turn a highly non-standard
model into vanilla semi-parametric logistic regression.
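The logistic-regression reduction can be sketched on a toy unnormalised model: a Gaussian with unknown mean mu, specified only up to a constant as log f(x; mu) = mu*x - x^2/2 + const, fit by discriminating data from uniform reference points. The normalising constant is absorbed into the regression intercept. All settings below are illustrative assumptions, not the paper's experiment.

```python
import numpy as np

# Sketch of unnormalised inference as logistic regression (illustrative).
# The unnormalised model is log f(x; mu) = mu*x - x^2/2 + const, so the
# log-ratio against a constant reference density is a quadratic in x.
rng = np.random.default_rng(0)
n = 2000
data = rng.normal(1.0, 1.0, n)             # "observed" points, true mu = 1
noise = rng.uniform(-4.0, 6.0, n)          # reference points, constant density

x = np.concatenate([data, noise])
lab = np.concatenate([np.ones(n), np.zeros(n)])   # 1 = data, 0 = reference
X = np.column_stack([np.ones_like(x), x, x**2])   # basis for log f - log q

# logistic regression by Newton's method
w = np.zeros(3)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    W = p * (1 - p)
    w += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (lab - p))

# since log f = mu*x - x^2/2 + const, the fitted coefficients recover mu
mu_hat = -w[1] / (2 * w[2])
```

The quadratic coefficient should come out near -1/2 and the recovered mean near 1, without the normalising constant ever being computed.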
Divide and conquer in ABC: Expectation-Propagation algorithms for likelihood-free inference
ABC algorithms are notoriously expensive in computing time, as they require
simulating many complete artificial datasets from the model. We advocate in
this paper a "divide and conquer" approach to ABC, where we split the
likelihood into n factors, and combine in some way n "local" ABC approximations
of each factor. This has two advantages: (a) such an approach is typically much
faster than standard ABC and (b) it makes it possible to use local summary
statistics (i.e. summary statistics that depend only on the data-points that
correspond to a single factor), rather than global summary statistics (that
depend on the complete dataset). This greatly alleviates the bias introduced by
summary statistics, and even removes it entirely in situations where local
summary statistics are simply the identity function.
We focus on EP (Expectation-Propagation), a convenient and powerful way to
combine n local approximations into a global approximation. Compared to the
EP-ABC approach of Barthelm\'e and Chopin (2014), we present two variations, one
based on the parallel EP algorithm of Cseke and Heskes (2011), which has the
advantage of being implementable on a parallel architecture, and one version
which bridges the gap between standard EP and parallel EP. We illustrate our
approach with an expensive application of ABC, namely inference on spatial
extremes.
Comment: To appear in the forthcoming Handbook of Approximate Bayesian
Computation (ABC), edited by S. Sisson, Y. Fan, and M. Beaumont
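The "divide and conquer" idea can be sketched in a single parallel-EP-flavoured pass on a conjugate toy problem, where each data point is one factor and the local summary statistic is the identity. This is a simplified illustration under assumed settings, not the EP-ABC algorithm of the paper (which iterates with proper cavity distributions).

```python
import numpy as np

# One-pass "divide and conquer" ABC sketch (illustrative): unknown mean theta,
# N(theta, 1) observations, N(0, 10^2) prior. Each observation is one factor;
# its local ABC approximation compares simulations to that data point only.
rng = np.random.default_rng(1)
data = np.array([1.0, 2.0, 0.5])
r0, p0 = 0.0, 1.0 / 100.0         # prior natural parameters
eps = 0.3                          # ABC tolerance
M = 200_000

r_sites, p_sites = [], []
for xi in data:
    theta = rng.normal(0.0, 10.0, M)           # draws from the prior (cavity here)
    x_sim = rng.normal(theta, 1.0)             # one pseudo-observation per draw
    keep = theta[np.abs(x_sim - xi) < eps]     # local summary = identity
    m, v = keep.mean(), keep.var()
    # site = Gaussian fit of the local ABC posterior, minus the prior
    p_sites.append(1.0 / v - p0)
    r_sites.append(m / v - r0)

# combine the n local approximations by adding natural parameters
abc_mean = (r0 + sum(r_sites)) / (p0 + sum(p_sites))

# exact conjugate posterior mean, for comparison
exact_mean = data.sum() / (p0 + len(data))
```

Because each factor only has to match its own data point, the tolerance can be kept small at modest simulation cost, which is the bias advantage described above.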
Spectral properties of kernel matrices in the flat limit
Kernel matrices are of central importance to many applied fields. In this
manuscript, we focus on spectral properties of kernel matrices in the so-called
"flat limit", which occurs when points are close together relative to the scale
of the kernel. We establish asymptotic expressions for the determinants of the
kernel matrices, which we then leverage to obtain asymptotic expressions for
the main terms of the eigenvalues. Analyticity of the eigenprojectors yields
expressions for limiting eigenvectors, which are strongly tied to discrete
orthogonal polynomials. Both smooth and finitely smooth kernels are covered,
with stronger results available in the finite smoothness case.
Comment: 40 pages, 8 page
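The flat-limit eigenvalue behaviour is easy to observe numerically: for a smooth (Gaussian) kernel on n distinct 1-D points, the k-th largest eigenvalue of the kernel matrix scales like eps^(2(k-1)) as the scale parameter eps goes to 0. The point set and eps values below are illustrative assumptions.

```python
import numpy as np

# Flat-limit scaling check (illustrative): eigenvalues of the Gaussian kernel
# matrix K_ij = exp(-(eps^2) * |x_i - x_j|^2) on 1-D points, as eps shrinks.
x = np.array([0.0, 0.3, 0.6, 1.0, 1.4, 1.9])
d2 = (x[:, None] - x[None, :]) ** 2

def eigs(eps):
    # eigenvalues of the kernel matrix, sorted largest first
    return np.sort(np.linalg.eigvalsh(np.exp(-(eps**2) * d2)))[::-1]

e_big, e_small = eigs(0.2), eigs(0.1)
# empirical scaling exponent: log2 of the eigenvalue ratio when eps is halved;
# for a smooth kernel in 1-D this should approach 0, 2, 4, ...
exponents = np.log2(e_big / e_small)
```

The top eigenvalue stays O(1) (the matrix tends to the all-ones matrix), while successive eigenvalues vanish at increasing even powers of eps, matching the grouped expansions described above.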
Estimating the inverse trace using random forests on graphs
Some data analysis problems require the computation of (regularised) inverse
traces, i.e. quantities of the form Tr[(q I + L)^{-1}]. For large
matrices, direct methods are unfeasible and one must resort to approximations,
for example using a conjugate gradient solver combined with Girard's trace
estimator (also known as Hutchinson's trace estimator). Here we describe an
unbiased estimator of the regularized inverse trace, based on Wilson's
algorithm, an algorithm that was initially designed to draw uniform spanning
trees in graphs. Our method is fast, easy to implement, and scales to very
large matrices. Its main drawback is that it is limited to diagonally dominant
matrices L.
Comment: Submitted to the GRETSI conference
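The baseline the abstract mentions, Girard's (Hutchinson's) estimator, can be sketched in a few lines: average z^T (qI + L)^{-1} z over random sign vectors z. The graph (a path), sizes, and the use of a dense direct solve are illustrative assumptions for a toy check; this is the comparison method, not the Wilson-algorithm estimator of the paper.

```python
import numpy as np

# Girard/Hutchinson estimator of Tr[(q*I + L)^(-1)] on a toy path-graph
# Laplacian (illustrative sizes; the dense solve stands in for CG here).
rng = np.random.default_rng(2)
n, q = 50, 1.0

# Laplacian of the path graph on n vertices
L = 2.0 * np.eye(n)
L[0, 0] = L[-1, -1] = 1.0
idx = np.arange(n - 1)
L[idx, idx + 1] = L[idx + 1, idx] = -1.0

A = q * np.eye(n) + L
exact = np.trace(np.linalg.inv(A))        # only feasible for toy sizes

m = 2000
Z = rng.choice([-1.0, 1.0], size=(n, m))  # Rademacher probe vectors
# E[z^T A^{-1} z] = Tr(A^{-1}) for Rademacher z, so average quadratic forms
est = np.mean(np.sum(Z * np.linalg.solve(A, Z), axis=0))
```

Each probe requires one linear solve, which is what makes the conjugate-gradient version scale to large sparse L.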
Asymptotic Equivalence of Fixed-size and Varying-size Determinantal Point Processes
Determinantal Point Processes (DPPs) are popular models for point processes
with repulsion. They appear in numerous contexts, from physics to graph theory,
and display appealing theoretical properties. On the more practical side of
things, since DPPs tend to select sets of points that are some distance apart
(repulsion), they have been advocated as a way of producing random subsets with
high diversity. DPPs come in two variants: fixed-size and varying-size. A
sample from a varying-size DPP is a subset of random cardinality, while in
fixed-size "k-DPPs" the cardinality is fixed. The latter makes more sense in
many applications, but unfortunately their computational properties are less
attractive, since, among other things, inclusion probabilities are harder to
compute. In this work we show that as the size of the ground set grows,
k-DPPs and DPPs become equivalent, meaning that their inclusion probabilities
converge. As a by-product, we obtain saddlepoint formulas for inclusion
probabilities in k-DPPs. These turn out to be extremely accurate, and suffer
less from numerical difficulties than exact methods do. Our results also
suggest that k-DPPs and DPPs have equivalent maximum likelihood
estimators. Finally, we obtain results on asymptotic approximations of
elementary symmetric polynomials which may be of independent interest.
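The elementary symmetric polynomials mentioned above are central to fixed-size DPPs: e_k of the kernel eigenvalues is the k-DPP normalising constant, and the cardinality of the matching varying-size DPP satisfies P(|Y| = k) = e_k / prod_i (1 + lambda_i). A short sketch of the standard recurrence, with illustrative eigenvalues:

```python
import numpy as np

# Elementary symmetric polynomials e_k of eigenvalues via the standard
# O(N*k) recurrence e_k <- e_k + lam_i * e_{k-1} (illustrative eigenvalues).
lam = np.array([3.0, 1.5, 0.8, 0.8, 0.2, 0.05])
N = len(lam)
e = np.zeros(N + 1)
e[0] = 1.0
for lam_i in lam:
    # update from high k down so e[k-1] still holds the previous value
    for k in range(N, 0, -1):
        e[k] += lam_i * e[k - 1]

# cardinality distribution of the varying-size DPP with these eigenvalues
card = e / np.prod(1.0 + lam)
```

The identity sum_k e_k = prod_i (1 + lambda_i) makes `card` a proper probability distribution; it is this distribution whose concentration, as the ground set grows, drives the k-DPP/DPP equivalence above.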
Modelling fixation locations using spatial point processes
Whenever eye movements are measured, a central part of the analysis has to do
with where subjects fixate, and why they fixated where they did. To a first
approximation, a set of fixations can be viewed as a set of points in space:
this implies that fixations are spatial data and that the analysis of fixation
locations can be beneficially thought of as a spatial statistics problem. We
argue that thinking of fixation locations as arising from point processes is a
very fruitful framework for eye movement data, helping turn qualitative
questions into quantitative ones.
We provide a tutorial introduction to some of the main ideas of the field of
spatial statistics, focusing especially on spatial Poisson processes. We show
how point processes help relate image properties to fixation locations. In
particular we show how point processes naturally express the idea that image
features' predictability for fixations may vary from one image to another. We
review other methods of analysis used in the literature, show how they relate
to point process theory, and argue that thinking in terms of point processes
substantially extends the range of analyses that can be performed and clarifies
their interpretation.
Comment: Revised following peer review
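The point-process workflow described above can be sketched end to end on synthetic data: simulate fixation-like points from an inhomogeneous spatial Poisson process whose log-intensity depends linearly on an image covariate, then recover the covariate's effect by Poisson regression on binned counts. Intensity form, values, and binning are illustrative assumptions.

```python
import numpy as np

# Toy spatial Poisson process analysis (illustrative): points on the unit
# square with intensity lambda(x, y) = exp(a + b*x), fitted by binning in x
# and Poisson regression (Newton's method) on the counts.
rng = np.random.default_rng(3)
a, b = 5.0, 1.5
lam_max = np.exp(a + b)                        # upper bound on the intensity

# simulate by thinning a homogeneous process of rate lam_max
n_cand = rng.poisson(lam_max)
cand = rng.uniform(0.0, 1.0, size=(n_cand, 2))
keep = cand[rng.uniform(0.0, 1.0, n_cand) < np.exp(a + b * cand[:, 0]) / lam_max]

# bin along x (the intensity only depends on x) and fit a Poisson GLM:
# log E[count_j] = log(bin area) + w0 + w1 * x_j
nb = 20
counts, edges = np.histogram(keep[:, 0], bins=nb, range=(0.0, 1.0))
mid = 0.5 * (edges[:-1] + edges[1:])
X = np.column_stack([np.ones(nb), mid])
offset = np.log(1.0 / nb)                      # each bin has area 1/nb

w = np.array([np.log(counts.sum() + 1.0), 0.0])   # sensible starting point
for _ in range(30):
    mu = np.exp(X @ w + offset)
    w += np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (counts - mu))
b_hat = w[1]
```

This is the quantitative version of the qualitative question "does this feature attract fixations?": the fitted slope estimates how strongly the covariate modulates fixation density, and it can differ from image to image exactly as the abstract describes.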